
Framework

Model

Trainer

class eole.utils.Statistics(loss=0, auxloss=0, n_batchs=0, n_sents=0, n_tokens=0, n_correct=0, computed_metrics=None, data_stats=None, attention_entropy=0, n_attention_samples=0)

Bases: object

Accumulator for loss statistics. Currently calculates:

  • accuracy
  • perplexity
  • elapsed time
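
A minimal usage sketch (illustrative values; loss is assumed here to hold the summed token-level loss of each batch): accumulate per-batch statistics and query the aggregates at reporting time.

```python
from eole.utils import Statistics

total = Statistics()
for loss, n_tokens, n_correct in [(90.0, 40, 25), (70.0, 35, 24)]:
    batch_stats = Statistics(loss=loss, n_batchs=1, n_sents=1,
                             n_tokens=n_tokens, n_correct=n_correct)
    total.update(batch_stats)  # sums the counters into the accumulator

print(total.accuracy())      # token-level accuracy
print(total.xent())          # cross entropy (loss normalized by tokens)
print(total.ppl())           # perplexity
print(total.elapsed_time())  # seconds since the accumulator was created
```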

accuracy()

Compute accuracy.

static all_gather_stats(stat, max_size=4096)

Gather a Statistics object across multiple processes/nodes.

  • Parameters:
    • stat (Statistics) – the Statistics object to gather across all processes/nodes
    • max_size (int) – max buffer size to use
  • Returns: Statistics, the updated stats object
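
For example, in a distributed run each rank can aggregate its local statistics before reporting (sketch; assumes the torch.distributed process group has already been initialized by the trainer):

```python
from eole.utils import Statistics

local_stats = Statistics(loss=80.0, n_batchs=1, n_tokens=32, n_correct=20)
# Collective call: every rank must reach this line; returns the summed stats.
global_stats = Statistics.all_gather_stats(local_stats, max_size=4096)
print(global_stats.accuracy())
```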

static all_gather_stats_list(stat_list, max_size=4096)

Gather a list of Statistics objects across all processes/nodes.

  • Parameters:
    • stat_list (list([Statistics])) – list of Statistics objects to gather across all processes/nodes
    • max_size (int) – max buffer size to use
  • Returns: list of updated stats
  • Return type: list([Statistics])
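
The list variant gathers several accumulators (e.g. one per corpus) in a single collective call (sketch; train_stats and valid_stats are assumed to be local Statistics objects built as above):

```python
gathered = Statistics.all_gather_stats_list([train_stats, valid_stats],
                                            max_size=4096)
```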

avg_attention_entropy()

Compute the average attention entropy.

computed_metric(metric)

Check whether a metric (TER/BLEU) has been computed and return it.
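
Assuming computed_metrics is a plain mapping from metric name to value (as the constructor argument above suggests), a metric can be read back like this (sketch):

```python
from eole.utils import Statistics

valid_stats = Statistics(computed_metrics={"BLEU": 31.4})
print(valid_stats.computed_metric("BLEU"))  # 31.4
```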

elapsed_time()

Compute elapsed time.

log_tensorboard(prefix, writer, learning_rate, patience, step)

Display statistics in TensorBoard.
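
A sketch of logging with a standard torch SummaryWriter (the prefix and hyperparameter values are illustrative; stats is an accumulator as in the sketch above):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")
stats.log_tensorboard("train", writer, learning_rate=2e-4, patience=0, step=100)
writer.close()
```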

output(step, num_steps, learning_rate, start)

Write out statistics to stdout.

  • Parameters:
    • step (int) – current step
    • num_steps (int) – total number of training steps
    • learning_rate (float) – current learning rate
    • start (int) – start time of the step
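
For instance, a training loop might print a progress line every report interval (sketch; stats is an accumulator as above):

```python
import time

start = time.time()
# ... accumulate batch statistics into `stats` ...
stats.output(step=100, num_steps=10000, learning_rate=2e-4, start=start)
```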

ppl()

Compute perplexity.

update(stat, update_n_src_tokens=False)

Update statistics by summing values from another Statistics object.

  • Parameters:
    • stat – another statistic object
    • update_n_src_tokens (bool) – whether to update (sum) n_src_tokens or not

xent()

Compute cross entropy.

Loss

Optimizer

class eole.utils.Optimizer(optimizer, learning_rate, learning_rate_decay_fn=None, max_grad_norm=None, use_amp=True)

Bases: object

Controller class for optimization. Mostly a thin wrapper around optim, but also useful for implementing rate scheduling beyond what is currently available. Also implements methods needed during training, such as gradient manipulation.

  • Parameters:
    • optimizer – A torch.optim.Optimizer instance.
    • learning_rate – The initial learning rate.
    • learning_rate_decay_fn – An optional callable taking the current step as argument and returning a learning rate scaling factor.
    • max_grad_norm – Clip gradients to this global norm.
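
A sketch of wrapping a plain torch optimizer by hand (in practice from_config below builds this from the run configuration; use_amp is disabled so the example stays CPU-only, and all hyperparameter values are illustrative):

```python
import torch
from eole.utils import Optimizer

model = torch.nn.Linear(8, 8)
base_optim = torch.optim.SGD(model.parameters(), lr=1.0)
optim = Optimizer(base_optim,
                  learning_rate=1.0,
                  learning_rate_decay_fn=lambda step: 0.95 ** (step // 100),
                  max_grad_norm=5.0,
                  use_amp=False)

loss = model(torch.randn(4, 8)).pow(2).mean()
optim.zero_grad()
optim.backward(loss)  # delegates the backward pass to the wrapper
optim.step()          # clips gradients, applies the decayed rate, updates params
```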

property amp

True if torch AMP mixed-precision training is used.

backward(loss)

Wrapper for the backward pass. Some optimizers require ownership of the backward pass.

classmethod from_config(model, config, checkpoint=None)

Builds the optimizer from options.

  • Parameters:
    • cls – The Optimizer class to instantiate.
    • model – The model to optimize.
    • config – The user configuration.
    • checkpoint – An optional checkpoint to load states from.
  • Returns: An Optimizer instance.
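
In normal use the trainer calls this for you; calling it directly looks roughly like the following (sketch; model and config are assumed to come from eole's model builder and a loaded run configuration):

```python
from eole.utils import Optimizer

optim = Optimizer.from_config(model, config, checkpoint=None)
print(optim.training_step)    # restored or initial step
print(optim.learning_rate())  # current (possibly decayed) learning rate
```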

learning_rate(step=None)

Returns the current learning rate.

step()

Update the model parameters based on current gradients.

Optionally applies gradient modification and updates the learning rate.

property training_step

The current training step.

zero_grad(set_to_none=True)

Zero the gradients of optimized parameters.

class eole.utils.AdaFactor(params, lr=None, beta1=0.9, beta2=0.999, eps1=1e-30, eps2=0.001, cliping_threshold=1, non_constant_decay=True, enable_factorization=True, ams_grad=True, weight_decay=0)

Bases: Optimizer

step(closure=None)

Perform a single optimization step to update parameters.

  • Parameters: closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
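
A rough usage sketch, assuming AdaFactor follows the standard torch.optim.Optimizer interface (the layer and hyperparameters are illustrative):

```python
import torch
from eole.utils import AdaFactor

model = torch.nn.Linear(16, 16)
optimizer = AdaFactor(model.parameters(), lr=1e-3, weight_decay=0)

loss = model(torch.randn(2, 16)).pow(2).mean()
loss.backward()
optimizer.step()
model.zero_grad(set_to_none=True)
```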